PCA Rerandomization

Abstract

Mahalanobis distance of covariate means between treatment and control groups is often adopted as a balance criterion when implementing a rerandomization strategy. However, this criterion may not work well for high-dimensional cases because it balances all orthogonalized covariates equally. We propose using principal component analysis (PCA) to identify proper subspaces in which Mahalanobis distance should be calculated. Not only can PCA effectively reduce the dimensionality of high-dimensional covariates, but it also provides computational simplicity by focusing on the top orthogonal components. The PCA rerandomization scheme has desirable theoretical properties for balancing covariates and thereby improving the estimation of average treatment effects. This conclusion is supported by numerical studies using both simulated and real examples.

Randomized experiments have long been regarded as the gold standard to measure the effect of an intervention, because randomization removes potential bias in the estimates by balancing the distributions of covariates on average. However, when pure (complete) randomization is implemented in practice, it often yields unbalanced allocations, so that the allocation should be rerandomized before the experiment is actually conducted.
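Although rerandomization was discussed earlier (Fisher, 1926; Cox, 2009; Worrall, 2010), its formal framework was not established until the publication of Morgan & Rubin (2012). Using Mahalanobis distance as the balance criterion in treatment–control experiments, rerandomization was shown to improve the precision of estimated treatment effects. Following Morgan & Rubin (2012), much effort has been made to extend or modify such rerandomization schemes. For example, Morgan & Rubin (2015) proposed a rerandomization strategy with different tiers of covariates according to their anticipated importance with respect to the outcome variable. An extension to $2^K$ factorial designs was developed by Branson, Dasgupta & Rubin (2016) based on an example of educational data. Zhou et al. (2018) considered rerandomization for sequentially enrolled units. Li, Ding & Rubin (2018, 2020) investigated the asymptotic properties of the estimator under rerandomization in treatment–control and factorial designs, respectively. Li & Ding (2020) further studied the combination of rerandomization and regression adjustment. Wang, Wang & Liu (2021) studied the statistical properties of stratification combined with rerandomization. Zhang & Yin incorporated response information into rerandomization to address ethical concerns in clinical trials. Yang, Qu and colleagues studied rerandomization theories in survey settings.

All the aforementioned methods use Mahalanobis distance as the balance measure, due to several appealing characteristics. First, Mahalanobis distance is invariant to any affine transformation of the original covariates. Second, Morgan & Rubin (2012) showed that rerandomization using Mahalanobis distance can preserve the unbiasedness of the estimator for equal-sized groups, and that it reduces the sampling variance of each covariate by an equal percent. Apart from rerandomization, Mahalanobis distance has been widely applied in matching for observational studies (Rubin, 1973a,b, 1979, 1980; Rosenbaum & Rubin, 1985; Stuart, 2010). Despite the advantages above, Mahalanobis distance requires full-rank covariate data (Branson & Shao, 2021), and it is difficult to balance equally a large number of covariates with different magnitudes of variances.

In related work, Morgan & Rubin (2015) balanced covariates hierarchically, in tiers prespecified by their importance to the outcome. Branson & Shao (2021) pointed out that researchers might not be able to specify the relative importance of covariates a priori. They proposed including a ridge term in the Mahalanobis distance, which puts more emphasis on the top components in the space after principal component analysis (PCA). However, their method relies on complicated Monte Carlo integration and constrained optimization to determine the value of the ridging parameter. Rather than Mahalanobis distance, Johansson & Schultzberg proposed a rank-based balance measure; their heuristic metric is designed for longitudinal data where pre-experimental outcomes are available for estimation. Moreover, theoretical properties have not yet been established under this metric. Using PCA, we calculate the Mahalanobis distance in the associated subspace and then perform rerandomization. Our method can be viewed as a lower dimensional alternative to the Mahalanobis distance.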
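The procedure described above (project the covariates onto the top principal components, compute the Mahalanobis distance of the mean differences in that subspace, and redraw the allocation until the distance falls below a threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `pca_scores`, `mahalanobis_pc` and `pca_rerandomize`, the `max_draws` cap, and the choice to pass the threshold directly (rather than deriving it from an acceptance probability) are all assumptions made for illustration.

```python
import numpy as np


def pca_scores(X, k):
    """Return the top-k principal component scores of X and their sample variances."""
    Xc = X - X.mean(axis=0)                      # center each covariate
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                            # n x k matrix of PC scores
    var = s[:k] ** 2 / (X.shape[0] - 1)          # variance of each score column
    return Z, var


def mahalanobis_pc(Z, var, w):
    """Mahalanobis distance of PC-score mean differences for allocation w (1 = treated)."""
    n1 = int(w.sum())
    n0 = len(w) - n1
    diff = Z[w == 1].mean(axis=0) - Z[w == 0].mean(axis=0)
    # PC scores are uncorrelated, so the covariance of the mean difference under
    # complete randomization is diagonal: (1/n1 + 1/n0) * diag(var).
    return float(np.sum(diff ** 2 / var) / (1.0 / n1 + 1.0 / n0))


def pca_rerandomize(X, k, threshold, n_treated, rng, max_draws=10_000):
    """Redraw complete randomizations until the PC-space distance is <= threshold."""
    Z, var = pca_scores(X, k)
    n = X.shape[0]
    for draw in range(1, max_draws + 1):
        w = np.zeros(n, dtype=int)
        w[rng.choice(n, size=n_treated, replace=False)] = 1
        m = mahalanobis_pc(Z, var, w)
        if m <= threshold:
            return w, m, draw
    raise RuntimeError("no acceptable allocation found within max_draws")


# Illustrative usage on synthetic covariates (hypothetical sizes and threshold).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
w, m, draws = pca_rerandomize(X, k=3, threshold=1.0, n_treated=20, rng=rng)
```

Because the principal component scores are uncorrelated, the distance reduces to a sum of squared standardized mean differences, so no matrix inversion is needed. In practice the threshold would typically be chosen from a target acceptance probability; it is left as a free parameter in this sketch.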
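Because it focuses on the selected top components, PCA rerandomization imposes more shrinkage on them given the same acceptance probability. The orthogonality of the principal components simplifies the covariance matrix to a diagonal matrix and thus improves the efficiency of calculating the Mahalanobis distance. We establish the theory on the distribution of the modified Mahalanobis distance and the variance reduction of covariate mean differences compared with complete randomization. Practically, despite its simplicity, our method is easy to implement and delivers good performance without the cumbersome parameter specification and increased computation required by other methods.

In Section 2, we review rerandomization using Mahalanobis distance (Morgan & Rubin, 2012). We present the details of PCA rerandomization in Section 3. Section 4 reports numerical results that compare our method with other methods. Section 5 concludes with a discussion.

Given the top $k$ principal components and a threshold $a_k$, the theoretical properties of PCA rerandomization can be derived; we defer the corresponding technical proofs to the Appendix. On average, PCA rerandomization balances the covariates and additionally leads to an unbiased estimator of $\tau$.

Theorem 1. Given a constant $a_k > 0$, […]. According to the definition of $M_k$, the allocations $W$ and $1_n - W$ […] under the threshold $a_k > 0$. Assuming Equation (1) holds, the distribution of $\bar{x}_T - \bar{x}_C$ and $\hat{\tau}$ follows […] (Theorem 2.1, Corollary 2.2). Theorem 1 can be extended to unobserved covariates […]. In addition to removing conditional bias, PCA rerandomization tends to make the difference more concentrated. Following the literature (Morgan & Rubin, 2012, 2015; […]), we adopt the normal approximation $(\bar{x}_T - \bar{x}_C) \mid X \sim$ […]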

Similar resources

Randomization, Rerandomization and Matching in Clinical Trials

Randomization was a key contribution of Sir Ronald Fisher to the conduct of scientific investigations. Along with the protective aspects of randomization, Fisher also noted that the distribution induced by randomization can form the basis of inference. Indeed, in some instances, the randomization test and related procedures seem to be the only tools available for inference. Several authors have...

EM Algorithms for PCA and Sensible PCA

Smart PCA

PCA can be smarter and makes more sensible projections. In this paper, we propose smart PCA, an extension to standard PCA to regularize and incorporate external knowledge into model estimation. Based on the probabilistic interpretation of PCA, the inverse Wishart distribution can be used as the informative conjugate prior for the population covariance, and useful knowledge is carried by the pri...

Circular PCA

Experimental time courses often reveal a nonlinear behaviour. Analysing these nonlinearities is even more challenging when the observed phenomenon is cyclic or oscillatory. This means, in general, that the data describe a circular trajectory which is caused by periodic gene regulation. Nonlinear PCA (NLPCA) is used to approximate this trajectory by a curve referred to as nonlinear component. Wh...

PCA Notes

1.1 Power iteration. We would like to recover the largest singular value $\sigma_1$ and the corresponding singular vectors $u_1$ and $v_1$. To simplify the discussion, assume that $\sigma_2 < \sigma_1$. We consider the following optimization problem: $\min_{p \in \mathbb{R}^n,\, q \in \mathbb{R}^m} \|A - pq^T\|_F^2$ s.t. $\|p\|_2 = 1$. Let $f(p, q) = \|A - pq^T\|_F^2$. The optimizers are shown to be $(p^*, q^*) = (u_1, \sigma_1 v_1)$ (up to a sign change). This problem can be...

Journal

Journal title: Canadian Journal of Statistics

Year: 2023

ISSN: ['0319-5724', '1708-945X']

DOI: https://doi.org/10.1002/cjs.11765